Unsupervised Classification of Verb Noun Multi-Word Expression Tokens
Abstract
We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. Our approach hinges upon the assumption that a literal VNC will have more in common with its component words than an idiomatic one. Commonality is measured by contextual overlap. To this end, we explore different contextual variations and different similarity measures. We also identify a new data set, OPAQUE, that comprises only non-decomposable VNC expressions. Our approach yields state-of-the-art performance with an overall accuracy of 77.56% on a TEST data set and 81.66% on the newly characterized...
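The core assumption in the abstract, that a literal VNC token shares more context with its component verb and noun than an idiomatic one, can be illustrated with a small sketch. This is not the authors' implementation: the function names `cosine` and `classify_vnc`, the bag-of-words context representation, the choice of cosine similarity, and the `threshold` value are all assumptions made for illustration. The abstract only states that several context variations and similarity measures were explored.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def classify_vnc(token_context, verb_contexts, noun_contexts, threshold=0.2):
    """Label one VNC token as 'literal' or 'idiomatic' (hypothetical helper).

    token_context : words surrounding this VNC occurrence
    verb_contexts : lists of words seen around the verb used on its own
    noun_contexts : lists of words seen around the noun used on its own
    threshold     : assumed cut-off, not a value taken from the paper
    """
    token_vec = Counter(token_context)
    # Pool the contexts of both component words into one vector.
    component_vec = Counter(
        w for ctx in verb_contexts + noun_contexts for w in ctx
    )
    score = cosine(token_vec, component_vec)
    label = "literal" if score >= threshold else "idiomatic"
    return label, score


# Toy contexts for the VNC "kick ... bucket" (made-up data).
kick_contexts = [["ball", "foot", "goal"]]
bucket_contexts = [["water", "spilled", "mop"]]

literal_label, literal_score = classify_vnc(
    ["ball", "foot", "water", "spilled"], kick_contexts, bucket_contexts
)
idiom_label, idiom_score = classify_vnc(
    ["died", "funeral", "old", "age"], kick_contexts, bucket_contexts
)
print(literal_label, idiom_label)  # literal idiomatic
```

In this toy setting the first occurrence overlaps heavily with the contexts of "kick" and "bucket" and is labeled literal, while the second shares no context words and is labeled idiomatic. A realistic system would need corpus-derived context vectors and a principled way to choose the cut-off.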
Similar Resources
A Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions
A verb-noun Multi-Word Expression (MWE) is a combination of a verb and a noun with or without other words, in which the combination has a meaning different from the meaning of the words considered separately. In this paper, we present a new lexical resource of Hebrew Verb-Noun MWEs (VN-MWEs). The VN-MWEs of this resource were manually collected and annotated from five different web resources. I...
Complex Predicates are Multi-Word Expressions
Practitioners of English Natural Language Processing often feel fortunate because their tokens are clearly marked by spaces on either side. However, the spaces can be quite deceptive, since they ignore the boundaries of multi-word expressions, such as noun-noun compounds, verb particle constructions, light verb constructions and constructions from Construction Grammar, e.g., caused-motion const...
Verb Noun Construction MWE Token Classification
We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. We present a supervised learning approach to the problem. We experiment with different features. Our approach yields the best results to date on MWE c...
Handling Sparsity for Verb Noun MWE Token Classification
We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. Our approach hinges upon the assumption that a literal VNC will have more in common with its component words than an idiomatic one. Commonality is mea...
A Nearly Unsupervised Learning Method for Automatic Paraphrasing of Japanese Noun Phrases
This paper describes a nearly unsupervised learning method for converting or paraphrasing Japanese noun phrases to verb phrases without changing their semantic contents. We focus on the paraphrasing of the noun phrases in the form of “A no B” where “A” and “B” are noun phrases and “no” is a postposition. Noun phrases in this form can express various relationships between “A” and “B.” In the past...